Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve weights docstrings #757

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Improve weights docstrings #757

wants to merge 3 commits into from

Conversation

nalimilan
Copy link
Member

AnalyticWeights have a precise definition, on which we rely in several functions.
Also make docstrings more consistent across types.

`AnalyticWeights` have a precise definition, on which we rely in several functions.
Also make docstrings more consistent across types.
@nalimilan nalimilan requested a review from rofinn January 18, 2022 21:36
@nalimilan nalimilan mentioned this pull request Jan 18, 2022
@ParadaCarleton
Copy link
Contributor

I think we should make an important note for both AnalyticWeights and FrequencyWeights that the scale of the weights matters, i.e. f(x, weights) will usually not equal f(x, 2 .* weights).

It would also help to be very specific about the following, to avoid people making the mistake of trying to use normalized weights:
For AnalyticWeights, w[i] must be equal to 1/var(x[i]), and var(x[i]) must be known ahead of time.
For FrequencyWeights, w[i] must be equal to the number of observations for x[i].

I believe the docstring for ProbabilityWeights is already clear enough, because probability weights are scale-invariant. (For now; in theory we might want to make a distinction between self-normalized and unnormalized weights in the future, because unnormalized weights can give unbiased estimators, but ATM we only use self-normalized weights).

Copy link
Member

@rofinn rofinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We already mentioned inverse variance weights, so maybe rewording that docstring to be less redundant would be good?

src/weights.jl Outdated Show resolved Hide resolved
src/weights.jl Outdated
Analytic weights describe a non-random relative importance (usually between 0 and 1)
for each observation. These weights may also be referred to as reliability weights,
Analytic weights represent the inverse of the variance for each case.
These weights may also be referred to as reliability weights,
precision weights or inverse variance weights. These are typically used when the observations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
precision weights or inverse variance weights. These are typically used when the observations
precision weights or regression weights. These are typically used when the observations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the expression "regression weights" is used commonly? I'm afraid it could confuse users, as you can use any kind of weights in regression (e.g. Stata supports all three types).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think SAS uses that term for them, but I don't know how common it is elsewhere. I'm also fine with just removing the redundant inverse variance and only listing two other common names (e.g., reliability weights, precision weights).

https://blogs.sas.com/content/iml/2017/10/02/weight-variables-in-statistics-sas.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's do that.

src/weights.jl Show resolved Hide resolved
@nalimilan
Copy link
Member Author

I think we should make an important note for both AnalyticWeights and FrequencyWeights that the scale of the weights matters, i.e. f(x, weights) will usually not equal f(x, 2 .* weights).

It would also help to be very specific about the following, to avoid people making the mistake of trying to use normalized weights: For AnalyticWeights, w[i] must be equal to 1/var(x[i]), and var(x[i]) must be known ahead of time. For FrequencyWeights, w[i] must be equal to the number of observations for x[i].

I've added mentions regarding scale-invariance of frequency and probability weights. For analytic weights I'd rather wait until #758 is settled as we may want to adjust the definition a bit (maybe in the next breaking release). AFAICT currently they are scale-invariant, right?

@nalimilan
Copy link
Member Author

I've updated the description of analytic weights in the light of #758. Does that sound correct?

@ParadaCarleton
Copy link
Contributor

I've updated the description of analytic weights in the light of #758. Does that sound correct?

I think we should hold off until #758 is resolved.

@ParadaCarleton
Copy link
Contributor

ParadaCarleton commented Jan 27, 2022

Oh, brief note that I think could be useful for users -- currently, all our methods for ProbabilityWeights normalize the weights before calculating an estimator. This is probably the best default, but sometimes it's useful to use the Hansen-Hurwitz estimators for means, variances, etc.; these estimators use the unnormalized weights, which makes them unbiased (but usually results in higher variance). The behavior of these estimators can be replicated by setting sum=1, in which case the weights won't be normalized.

@nalimilan
Copy link
Member Author

@ParadaCarleton Don't you think that this PR is a strict improvement over the current situation, even if we decide to split AnalyticWeights into several types?

@ParadaCarleton
Copy link
Contributor

I think it's an improvement, yeah, but I'd clarify that the weights:

  1. Refer specifically to sample sizes for each observation.
  2. I'd add a warning about std doing something different from what you'd expect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants